[Feature] add traceback to error logs and optimize trace log #7608
xyxinyang wants to merge 1 commit into PaddlePaddle:develop from
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7608 +/- ##
==========================================
Coverage ? 72.31%
==========================================
Files ? 419
Lines ? 57907
Branches ? 9089
==========================================
Hits ? 41877
Misses ? 13175
Partials ? 2855
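The CI report is generated from the following code (updated every 30 minutes):
1 Task overview
❌ 1 Required task has failed and 8 Required tasks are still running; evaluation must wait until they finish.
2 Task status summary
2.1 Required tasks: 1/10 passed
2.2 Optional tasks: 23/26 passed
3 Failure details (Required only)
Approval: PR approval process (confidence: high)
Root cause details:
Key logs:
Fix suggestions:
Related changes: the PR title is "add traceback to error logs and optimize trace log", with a large number of newly added …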
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-04-28 11:01:59
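📋 Review Summary
PR overview: Systematically adds traceback information to FastDeploy error logs (20+ files), adds the CACHE_HIT/CACHE_MISS trace events, and removes the deprecated FD_TRACE environment variable
Scope of changes: cache_manager/, engine/, entrypoints/, envs.py, trace/constants.py
Impact tags: [KVCache] [Engine] [APIServer] [FDConfig]
📝 PR Convention Check
The title uses the officially valid tag [Feature], which is acceptable; in substance, [Optimization] fits this change better (a logging enhancement rather than a new feature), offered for the author's reference. Two non-applicable items are checked in the Checklist; per rule D3 they should simply be deleted.
Suggested title (can be copied directly):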
[Optimization] add traceback to error logs and optimize trace log
Suggested PR description (can be copied directly; must replicate the full structure of the checklist §D2 template):
## Motivation
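Optimizes FastDeploy's logging system (4th PR in the series): adds traceback information to error logs inside try blocks so that the full call stack is visible when an exception occurs; also standardizes trace log events and adds Prefix Cache hit/miss events.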
## Modifications
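1. **Add traceback to error logs**: add `traceback.format_exc()` to the `log_request_error` and `.error()` calls inside try blocks across 20+ files, so that the full call stack is visible when an exception occurs
2. **Two new cache events in trace.log** (`constants.py`, `prefix_cache_manager.py`):
- `CACHE_HIT` - Prefix Cache hit; explains why a request's TTFT is low (cached blocks are reused, skipping part of Prefill)
- `CACHE_MISS` - Prefix Cache miss; explains why a request's TTFT is high (a full Prefill is required)
3. **Cleanup**: remove the unused `FD_TRACE` environment variable (`envs.py`); remove the duplicate `print()` call in `metrics/trace.py`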
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
Issues
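| Level | File | Summary |
|---|---|---|
| 📝 PR convention | No specific line number | Checklist items [x] Provide accuracy results and [x] If the current PR is submitting to the release branch are not applicable and should be deleted |
| ❓ Question | fastdeploy/cache_manager/prefix_cache_manager.py:1012 | The user argument of the CACHE_HIT/CACHE_MISS trace events is passed "" rather than getattr(task, "user", ""), which is inconsistent with other calls in the same file |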
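Overall Assessment
The PR is of good overall quality. It systematically adds traceback information to except blocks in 20+ files, which improves the ability to diagnose exceptions in production. The new CACHE_HIT/CACHE_MISS events fill the Prefix Cache observability gap in the trace log, and test coverage is fairly comprehensive. The only issues are PR convention formatting and one minor inconsistency in the user argument; neither blocks merging.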
    if matched_block_num > 0:
        self.metrics.hit_req_count += 1
        # Record CACHE_HIT trace event
        trace_print(LoggingEventName.CACHE_HIT, req_id, "")
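❓ Question: the user argument is passed an empty string "", which is inconsistent with other trace_print calls in the same file (for example, WRITE_CACHE_TO_STORAGE_START uses getattr(request, "user", "")).
Here task is the Request object, so consider changing it to:
trace_print(LoggingEventName.CACHE_HIT, req_id, getattr(task, "user", ""))
The same applies to the CACHE_MISS line.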
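The suggestion above can be illustrated with a small standalone sketch. Everything here is a hypothetical stand-in rather than FastDeploy code: `LoggingEventName` and `trace_print` are stubbed locally, `record_cache_event` is an invented helper, and it assumes that CACHE_MISS is emitted whenever `matched_block_num` is 0. It only shows how `getattr(task, "user", "")` forwards the user when the attribute exists and falls back to an empty string otherwise.

```python
from enum import Enum
from types import SimpleNamespace


# Hypothetical stand-ins; the real LoggingEventName and trace_print live in FastDeploy.
class LoggingEventName(Enum):
    CACHE_HIT = "CACHE_HIT"
    CACHE_MISS = "CACHE_MISS"


TRACE_RECORDS = []


def trace_print(event, request_id, user):
    """Stub: collect the event in memory instead of writing to trace.log."""
    TRACE_RECORDS.append((event.value, request_id, user))


def record_cache_event(task, req_id, matched_block_num):
    """Emit CACHE_HIT or CACHE_MISS, forwarding the request's user if present."""
    # getattr with a default keeps the call safe when the task has no user attribute.
    user = getattr(task, "user", "")
    if matched_block_num > 0:
        trace_print(LoggingEventName.CACHE_HIT, req_id, user)
    else:
        # Assumption: a request with no matched blocks counts as a miss.
        trace_print(LoggingEventName.CACHE_MISS, req_id, user)


# A task that carries a user, and one that does not.
record_cache_event(SimpleNamespace(user="alice"), "req-1", matched_block_num=3)
record_cache_event(SimpleNamespace(), "req-2", matched_block_num=0)
```

With this shape, trace consumers can filter cache events by user in the same way as the other events in the file, and requests without a user still log cleanly.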
Motivation
This PR optimizes FastDeploy's logging system; the work is expected to be completed in 4 PRs.
Modifications
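1. Add traceback to error logs
- Add traceback.format_exc() to log_request_error and .error() calls
2. trace.log optimization
- CACHE_HIT - Prefix Cache hit; explains why a request's TTFT is low (cached blocks are reused, skipping part of Prefill)
- CACHE_MISS - Prefix Cache miss; explains why a request's TTFT is high (a full Prefill is required)
3. Cleanup
- Remove the unused FD_TRACE environment variable (envs.py)
Usage or Command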
Usage is unchanged.
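For readers unfamiliar with the pattern named under Modifications, the following is a minimal sketch of appending `traceback.format_exc()` to an error log inside an except block. The logger name `fd_sketch`, the function `parse_max_tokens`, and the message format are all hypothetical; the actual call sites and message wording in this PR may differ.

```python
import io
import logging
import traceback

# Hypothetical logger that writes to an in-memory stream so the output can be inspected.
stream = io.StringIO()
logger = logging.getLogger("fd_sketch")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False


def parse_max_tokens(raw):
    """Parse an integer field, logging the full call stack on failure."""
    try:
        return int(raw)
    except Exception as e:
        # format_exc() returns the traceback of the exception currently being handled,
        # so the log shows where the error originated, not just its message.
        logger.error(f"failed to parse max_tokens: {e}, {traceback.format_exc()}")
        return None


result = parse_max_tokens("abc")
log_output = stream.getvalue()
```

Python's standard `logger.exception(...)` and `logger.error(..., exc_info=True)` also attach the traceback; embedding `format_exc()` in the message, as the PR describes, keeps the stack inside the message string itself.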
Accuracy Tests
N/A
Checklist
- Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.